clear all

import dbase using "../../data/raw data/1967床面積と建築年数町別/hiroshima67_floorspace_points2.dbf"
merge 1:1 町名 using "../../data/raw data/1967床面積と建築年数町別/hiroshima67_area_h.dta"
drop _merge

rename distance distance_epicenter
* CBD distance
rename distance_2 distance

* Calculate the amount of new construction in each year interval (area in km^2)
replace area = area/1000000
gen LnArea = ln(area)

* habitable area (excluding forests)
gen area_h = area_2/1000000
gen LnArea_h = ln(area_h)


* Floor space (transformed from tsubo to km2: 1 tsubo = 3.30579e-6 km2). 
* Depreciation rate of buildings 0.95 is assumed for 5 years. We need to compute the number of new construction while what we observe is the t-year-old floor space, which is net of depreciation. observed stock/0.95^#periods is the new construction #periods ago.
* (ln (observed stock/0.95^#periods) = ln(observed stock)-#periods*ln(0.95))
replace Total1 = Total1 *0.00000331
replace Total6 = Total6 *0.00000331
replace Total11 = Total11 *0.00000331 / 0.95^1
replace Total16 = Total16 *0.00000331 / 0.95^2
replace Total21 = Total21 *0.00000331 / 0.95^3
replace Total26 = Total26 *0.00000331 / 0.95^4
replace Total30 = Total30 *0.00000331 / 0.95^5
* gen Built prior to 1945
gen Total26over = Total26 + Total30

* Built in 1966 (multiplied by 5 to make it comparable with other years)
gen LnTotal_1 = ln(5*Total1)
* Built in 1961-1965
gen LnTotal_6 = ln(Total6)
* Built in 1956-1960
gen LnTotal_11 = ln(Total11)
* Built in 1951-1955
gen LnTotal_16 = ln(Total16)
* Built in 1946-1950
gen LnTotal_21 = ln(Total21)

* Log housing price using geographical area
* We assume the housing supply elasticity 4 (https://www.jstage.jst.go.jp/article/jscejipm/77/5/77_I_243/_article/-char/ja/)
gen lnQ_66 = (1/4) * (LnTotal_1  - LnArea)
gen lnQ_65 = (1/4) * (LnTotal_6 - LnArea)
gen lnQ_60 = (1/4) * (LnTotal_11  - LnArea)
gen lnQ_55 = (1/4) * (LnTotal_16  - LnArea)
gen lnQ_50 = (1/4) * (LnTotal_21  - LnArea)

* Habitable area version
gen lnQ_66_h = (1/4) * (LnTotal_1  - LnArea_h)
gen lnQ_65_h = (1/4) * (LnTotal_6   - LnArea_h)
gen lnQ_60_h = (1/4) * (LnTotal_11- LnArea_h)
gen lnQ_55_h = (1/4) * (LnTotal_16 - LnArea_h)
gen lnQ_50_h = (1/4) * (LnTotal_21 - LnArea_h)

* Merge the observed land price data in 1975
merge 1:1 町名 using "../../data/raw data/landprice/Hiroshima75_landprice_blocks.dta"
drop _merge

* Highest floor space price. Quite close to the highest land price point
gsort -lnQ_55
gsort -lnQ_60
gsort -lnQ_65

* Analyze the persistence in the error term
* Use the approximation ln(1+g) = g
gen E_65 = lnQ_65-lnQ_60
gen E_60 = lnQ_60-lnQ_55
gen E_55 = lnQ_55-lnQ_50

reg E_65 E_60
reg E_65 E_55
reg E_60 E_55


gen E55_original=E_55
gen E60_original=E_60
gen E65_original=E_65

* Save the floor price series


* Predict 1970 growth rate
* The growth rate is assumed to follow AR2 process
reg E_65 E_60 E_55

replace E_55=E_60
replace E_60=E_65
predict E_70, xb
replace E_55=E_65
replace E_60=E_70
predict E_75, xb

* Construct predicted floor space prices using the above growth ratio
* Again use the approximation ln(1+g)Q_{t-1} = lnQ_{t-1} + ln(1+g) =  lnQ_{t-1} + g
gen lnQ_70 = lnQ_65 + E_70
gen lnQ_75 = lnQ_70 + E_75


** Observed price. 
gen lp75_data = ln(lp_avg)

* The coefficient corresponds to the input share of land 
* This also informs the elasticity of housing supply (but measurement error of land price will bias beta to zero)
reg lnQ_75 lp75_data
reg lnQ_75 lp75_data if distance<6000
reg lnQ_75 lp75_data if distance<3000

sum lnQ_75, d
sum lp75_data if lp75_data!=., d

** Validation with predictions by AR1 model (main specification uses AR2)
* AR1 prediction is generated by AR1prediction.do
merge 1:1 町名 using "../../data/raw data/1967床面積と建築年数町別\AR1prediction.dta"

gen lnQ_70_AR1 = lnQ_65 + E70_AR1
gen lnQ_75_AR1 = lnQ_70_AR1 + E75_AR1

reg lnQ_70 lnQ_70_AR1
reg lnQ_75 lnQ_75_AR1


** Checking whether the ratio of floor space supply is consistent with the price ratio
gen q_50 = exp(lnQ_50)
gen q_55 = exp(lnQ_55)
gen q_60 = exp(lnQ_60)
gen q_65 = exp(lnQ_65)
gen q_70 = exp(lnQ_70)
gen q_75 = exp(lnQ_75)

gen g55 = (q_55/q_50)^4 - (Total16/Total21)
gen g60 = (q_60/q_55)^4 - (Total11/Total16)
gen g65 = (q_65/q_60)^4 - (Total6/Total11)

sum g55 g60 g65, d

* Validation of predicted floor space price, using aggregated spatial units 
* Unit: 20 grids by distance from CBD
keep if distance<6000
xtile cbddist_d2 = distance, nquantiles (20)
collapse (mean) lnQ_75 lp75_data, by(cbddist_d2)

* For comparison, normalize both floor space price and land price to mean zero
egen lnQ_75_mean = mean(lnQ_75)
replace lnQ_75 = lnQ_75 - lnQ_75_mean 
egen lp75_data_mean = mean(lp75_data)
replace lp75_data = lp75_data - lp75_data_mean 

reg lnQ_75 lp75_data
test lp75_data=0.2
predict resid_v, residual
corr lnQ_75 lp75_data

** Figure A.9 in paper
#d;
twoway (scatter lnQ_75 lp75_data, sort color(black) msize(small)) 
(lfit lnQ_75 lp75_data, sort color(cranberry) lwidth(medthick)),
	scheme(s1color) plotregion(lwidth(none) ilwidth(none))
    legend(off)
	xtitle("Observed log land price in 1975")
	ytitle("Predicted log floor space price in 1975") ylabel(, angle(horizontal));
	graph export "../../output/figure/prediction_validation_CBD20grids.pdf", as(pdf)   replace ;
#d cr
